Fix IndexMetaData loads after rollover by nik9000 · Pull Request #33394 · elastic/elasticsearch

nik9000 · 2018-09-04T19:11:59Z

When we roll over an index we write the conditions of the rollover that
the old index met into the old index. Loading this index metadata
requires a working NamedXContentRegistry that has been populated with
parsers from the rollover infrastructure. We had a few loads that didn't
use a working NamedXContentRegistry and so would fail if they ever
encountered an index that had been rolled over. Here are the locations
of the loads and how I fixed them:

* IndexFolderUpgrader - removed entirely. It existed to support opening
  indices made in Elasticsearch 2.x. Since we only need this change as far
  back as 6.4.1, which supports reading from indices created as far
  back as 5.0.0, we should be good here.
* TransportNodesListGatewayStartedShards - wired the
  NamedXContentRegistry into place.
* TransportNodesListShardStoreMetaData - wired the
  NamedXContentRegistry into place.
* OldIndexUtils - removed entirely. It existed to support the zip based
  index backwards compatibility tests which we've since replaced with code
  that actually runs old versions of Elasticsearch.
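
To make the failure mode concrete, here is a minimal toy model of a name-based parser registry. It is a sketch under stated assumptions, not Elasticsearch code: `ToyRegistry`, `ToyIndexMetaDataLoader`, and their methods are hypothetical stand-ins, and the parser bodies are placeholders. It only illustrates why a load that passes an empty registry breaks as soon as the metadata contains a named rollover condition.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Toy model only: hypothetical stand-in, not Elasticsearch's NamedXContentRegistry.
final class ToyRegistry {
    // A shared empty instance, analogous in spirit to an "EMPTY" registry constant.
    static final ToyRegistry EMPTY = new ToyRegistry(new HashMap<>());

    private final Map<String, Function<String, String>> parsers;

    ToyRegistry(Map<String, Function<String, String>> parsers) {
        this.parsers = parsers;
    }

    // Looks up the parser registered under `name` and applies it to `raw`.
    // Throws if nothing was registered under that name.
    String parseNamed(String name, String raw) {
        Function<String, String> parser = parsers.get(name);
        if (parser == null) {
            throw new IllegalArgumentException("unknown named object [" + name + "]");
        }
        return parser.apply(raw);
    }
}

// Hypothetical loader: a rolled-over index carries named conditions in its metadata.
final class ToyIndexMetaDataLoader {
    static String loadCondition(ToyRegistry registry, String conditionName, String rawValue) {
        return registry.parseNamed(conditionName, rawValue);
    }

    // Builds a registry populated with placeholder parsers for two condition names.
    static ToyRegistry registryWithRolloverParsers() {
        Map<String, Function<String, String>> parsers = new HashMap<>();
        parsers.put("max_age", raw -> "max_age=" + raw);
        parsers.put("max_docs", raw -> "max_docs=" + raw);
        return new ToyRegistry(parsers);
    }
}
```

In this model, `loadCondition(ToyRegistry.EMPTY, "max_age", "7d")` throws, while the same call with `registryWithRolloverParsers()` succeeds. An index that was never rolled over has no named conditions to look up, which is why the broken loads went unnoticed.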

In addition to fixing the actual problem I added full cluster restart
integration tests for rollover which would have caught this problem and
I added an extra assertion to IndexMetaData's deserialization code which
will trip if we try to deserialize an index's metadata without a fully
formed NamedXContentRegistry. It won't catch the case where we use the
wrong NamedXContentRegistry but it is better than nothing.
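
The limits of that assertion can be sketched with a small stand-alone example. Everything here is hypothetical (`SketchRegistry`, `GuardSketch`, and the parser name `unrelated_parser` are made up for illustration); the only thing it borrows from the change is the idea of comparing against a shared empty instance by identity.

```java
import java.util.Collections;
import java.util.Set;

// Hypothetical stand-in for a registry; not the Elasticsearch class.
final class SketchRegistry {
    static final SketchRegistry EMPTY = new SketchRegistry(Collections.emptySet());

    private final Set<String> knownNames;

    SketchRegistry(Set<String> knownNames) {
        this.knownNames = knownNames;
    }

    // True if a parser was registered under this name.
    boolean knows(String name) {
        return knownNames.contains(name);
    }
}

final class GuardSketch {
    // Same shape as the guard: reject only the shared EMPTY instance (identity check).
    static boolean passesGuard(SketchRegistry registry) {
        return registry != SketchRegistry.EMPTY;
    }

    // A registry that is not EMPTY but lacks the parsers the metadata needs.
    static SketchRegistry wrongRegistry() {
        return new SketchRegistry(Collections.singleton("unrelated_parser"));
    }
}
```

Here `passesGuard(SketchRegistry.EMPTY)` is false, so the obvious mistake is caught, but `passesGuard(wrongRegistry())` is true even though `wrongRegistry().knows("max_age")` is false. That gap is the one the integration tests are meant to cover.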

Closes #33316

elasticmachine · 2018-09-04T19:12:00Z

Pinging @elastic/es-core-infra

nik9000 · 2018-09-04T19:12:43Z

I've not run this through much testing locally but I'm opening it so I can get CI to run. I'll remove the WIP label after I'm more sure of it.

nik9000 · 2018-09-04T19:44:27Z

@elasticmachine, retest this please.

nik9000 · 2018-09-04T19:45:44Z

server/src/main/java/org/elasticsearch/gateway/TransportNodesListGatewayStartedShards.java

                    // sometimes the request comes in before the local node processed that cluster state
                    // in such cases we can load it from disk
-                    metaData = IndexMetaData.FORMAT.loadLatestState(logger, NamedXContentRegistry.EMPTY,
+                    metaData = IndexMetaData.FORMAT.loadLatestState(logger, namedXContentRegistry,


While I'm 99% sure this change is correct, I don't know of a way to trigger this code locally.

I'm not sure I follow. What do you mean with triggering code locally?

This branch looks like the kind of thing that won't kick in much. Do we have a test that'll call it?

I think we only have the general IT tests that CI runs. This class is not well tested :(

I stuck an exception there and found RecoveryFromGatewayIT.testStartedShardFoundIfStateNotYetProcessed when I ran the test. So it looks like we do cover it. So I feel good!

nik9000 · 2018-09-04T19:46:01Z

server/src/main/java/org/elasticsearch/indices/store/TransportNodesListShardStoreMetaData.java

                // sometimes the request comes in before the local node processed that cluster state
                // in such cases we can load it from disk
-                metaData = IndexMetaData.FORMAT.loadLatestState(logger, NamedXContentRegistry.EMPTY,
+                metaData = IndexMetaData.FORMAT.loadLatestState(logger, namedXContentRegistry,


While I'm 99% sure this change is correct, I don't know of a way to trigger this code locally.

nik9000 · 2018-09-05T13:45:57Z

Marking team-discuss to help find a reviewer.

bleskes

LGTM. Good one.

bleskes · 2018-09-06T20:35:40Z

server/src/main/java/org/elasticsearch/cluster/metadata/IndexMetaData.java


        @Override
        public IndexMetaData fromXContent(XContentParser parser) throws IOException {
+            assert parser.getXContentRegistry() != NamedXContentRegistry.EMPTY


bleskes · 2018-09-06T20:36:44Z

server/src/main/java/org/elasticsearch/gateway/TransportNodesListGatewayStartedShards.java

                    // sometimes the request comes in before the local node processed that cluster state
                    // in such cases we can load it from disk
-                    metaData = IndexMetaData.FORMAT.loadLatestState(logger, NamedXContentRegistry.EMPTY,
+                    metaData = IndexMetaData.FORMAT.loadLatestState(logger, namedXContentRegistry,


I'm not sure I follow. What do you mean with triggering code locally?

6.4 has a bad bug where it won't if any of the shards on the node have been rolled over. This documents that in the known issues for 6.4.0 and links to the fix in the bug fixes section of 6.4.1. Relates to elastic#33394

cumtwwei · 2019-04-02T13:18:39Z

nice job！
Thanks

nik9000 added >bug blocker WIP :Data Management/Indices APIs DO NOT USE. Use ":Distributed/Indices APIs" or ":StorageEngine/Templates" instead. v7.0.0 v6.5.0 v6.4.1 labels Sep 4, 2018

nik9000 added review team-discuss and removed WIP labels Sep 4, 2018

talevy self-requested a review September 6, 2018 00:49

nik9000 removed the team-discuss label Sep 6, 2018

nik9000 requested a review from bleskes September 6, 2018 19:43

bleskes approved these changes Sep 6, 2018

talevy removed their request for review September 6, 2018 20:44

nik9000 merged commit 0d45752 into elastic:master Sep 6, 2018

nik9000 added the backport pending label Sep 6, 2018

nik9000 removed the backport pending label Sep 7, 2018

nik9000 mentioned this pull request Sep 13, 2018

Docs: Add a note about rollover issues #33679

Merged

colings86 added the v7.0.0-beta1 label Feb 7, 2019

colings86 removed the v7.0.0 label Feb 7, 2019

5 participants